# High-fidelity Audio

CSM (Conversational Speech Model) is a 1B-parameter speech generation model developed by Sesame. It generates RVQ (residual vector quantization) audio codes from text and audio inputs.

Speech Synthesis, English
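
The CSM entry above mentions RVQ audio codes. As a rough illustration of what residual vector quantization means in general, here is a minimal pure-Python sketch: each stage picks the nearest codebook entry and passes the leftover residual to the next stage. The codebooks, the 2-dimensional `frame`, and the function names are hypothetical toy values chosen for this example; they are not taken from CSM or its audio codec.

```python
def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def rvq_encode(vec, codebooks):
    """Quantize vec in stages; each stage encodes the residual left by the previous one."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Subtract the chosen entry so the next stage only sees what is still unexplained.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codes, codebooks):
    """Reconstruct the vector by summing the selected entry from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical toy codebooks: a coarse first stage and a finer second stage.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],
    [[0.0, 0.0], [0.25, -0.25], [-0.25, 0.25]],
]
frame = [1.2, 0.8]                     # one made-up 2-D "audio frame" embedding
codes = rvq_encode(frame, codebooks)   # one integer code per stage
recon = rvq_decode(codes, codebooks)   # approximate reconstruction of frame
```

In this toy case the frame is represented by one small integer per stage, and adding more stages would shrink the reconstruction error further. A real codec learns its codebooks from data and works on much higher-dimensional frames.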

A PyTorch-based text-to-speech model supporting Chinese speech synthesis, developed and released by SesameAILabs.

Speech Synthesis

SepFormer DNS4 16k Enhancement

A speech enhancement model based on the SepFormer architecture and designed for denoising. It was trained on the Microsoft DNS-4 dataset and processes audio sampled at 16 kHz.

Audio Enhancement, Supports Multiple Languages
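
Because the enhancement model above is described as working at a 16 kHz sampling rate, it can help to check an input file's rate before passing it to any such model. The sketch below uses only Python's standard `wave` module; the helper names `wav_sample_rate` and `write_silence` and the demo file names are hypothetical, and the snippet does not load or call the model itself.

```python
import os
import tempfile
import wave

EXPECTED_RATE = 16000  # 16 kHz, as stated in the model description


def wav_sample_rate(path):
    """Return the sampling rate (Hz) stored in a WAV file header."""
    with wave.open(path, "rb") as wf:
        return wf.getframerate()


def write_silence(path, rate, seconds=0.1):
    """Write a short mono 16-bit silent WAV file, used here only as demo input."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * int(rate * seconds))


# Demo: create one file at 16 kHz and one at 8 kHz, then check both.
tmpdir = tempfile.mkdtemp()
ok_path = os.path.join(tmpdir, "ok_16k.wav")
bad_path = os.path.join(tmpdir, "bad_8k.wav")
write_silence(ok_path, 16000)
write_silence(bad_path, 8000)

ok_matches = wav_sample_rate(ok_path) == EXPECTED_RATE
bad_matches = wav_sample_rate(bad_path) == EXPECTED_RATE
```

A file that fails this check would need to be resampled to 16 kHz with an audio library of your choice before enhancement.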
